A Computational Grammar for Georgian
Abstract
In this paper, I give an overview of an ongoing project that aims to build a full-scale computational grammar for Georgian in the Lexical Functional Grammar (LFG) framework, and I try to illustrate both practical and theoretical aspects of grammar development. The rich and complex morphology of the language is a major challenge when building a computational grammar for Georgian that is meant to be more than a toy system. I discuss my treatment of the morphology and show how morphology interfaces with syntax. I then illustrate how some of the main syntactic constructions of the language are implemented in the grammar. Finally, I present the indispensable tools used in developing the grammar system: fst, the XLE parsing platform, the LFG Parsebanker, and a large searchable corpus of non-fiction and fiction texts.
Similar resources
A Finite-State Model of Georgian Verbal Morphology
Georgian is a less commonly studied language with complex, non-concatenative verbal morphology. We present a computational model for generation and recognition of Georgian verb conjugations, relying on the analysis of Georgian verb structure as a word-level template. The model combines a set of finite-state transducers with a default inheritance mechanism.
Polynomial Pregroup Grammars parse Context Sensitive Languages
Pregroup grammars with a possibly infinite number of lexical entries are polynomial if the length of type assignments for sentences is a polynomial in the number of words. Polynomial pregroup grammars are shown to generate the standard mildly context-sensitive formal languages as well as some context-sensitive natural language fragments of Dutch, Swiss German, or Old Georgian. A polynomial recogn...
Semilinearity as a Syntactic Invariant
Mildly context sensitive grammar formalisms such as multi-component TAGs and linear context free rewrite systems have been introduced to capture the full complexity of natural languages. We show that, in a formal sense, Old Georgian can be taken to provide an example of a non-semilinear language. This implies that none of the aforementioned grammar formalisms is strong enough to generate this l...
Intonational Phonology of Georgian
This paper proposes a prosodic structure and the tonal pattern of Georgian, the national language of Georgia. The language has three prosodic units above the Word: Intonation Phrase (IP), Intermediate Phrase (ip), and Accentual Phrase (AP). All these units are marked by a boundary tone, but an AP in Georgian is unique typologically in that it has pitch accent linked to a stressed syllable and p...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although they originated in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...